Abstract

Several statistics used by physicists to declare the observability of a
signal over the background are compared. It is shown that the frequentist
method of testing a precise hypothesis allows one to estimate, by Monte
Carlo calculations, the power of criteria with a specified level of
significance for the considered statistics. The application of this
approach to the analysis of the discovery potential of experiments is
discussed.
Introduction

One of the common tasks of search experiments is the detection of a
predicted new Phenomenon. As a rule, estimates of the expected mean number
$N_s$ of signal events of the new Phenomenon and $N_b$ of background events
are known. We then want to know whether the given experiment is able to
detect the new Phenomenon. To check a statement about the observation of
the Phenomenon, a researcher uses some function of the observed number of
events, a statistic. The value of this statistic for $x$ detected events
allows one to find the degree of confidence of the discovery statement.
Once a conclusion on the observation of the Phenomenon has been drawn, two
kinds of mistake are possible: to state that the Phenomenon is absent when
in fact it exists (Type I error), or to state that the Phenomenon exists
when in fact it is absent (Type II error).

In this paper we compare the "signal significances" used by researchers
for hypothesis testing about the observation of the Phenomenon:

(a) "significance" $S_1 = \frac{N_s}{\sqrt{N_b}}$,

(b) "significance" $S_2 = \frac{N_s}{\sqrt{N_s + N_b}}$,

(c) "significance" $S_{12} = \sqrt{N_s + N_b} - \sqrt{N_b}$,

(d) the likelihood ratio as defined in the references.

For this purpose we formulate the null and alternative hypotheses,
construct the statistical test, determine the rejection region by Monte
Carlo calculations, make the decision and find the power of the test for
the criteria with a specified level of significance. We also use an
equal-tailed test to study the behaviour of the Type I and Type II errors
versus $N_s$ and $N_b$ for specified values of $S_1$ and $S_2$. The
hypothesis testing results obtained by Monte Carlo calculations are
compared with the results obtained by direct calculation of the
probability density functions.

1 Notations

Let us study a physical process during a fixed time. Estimates of the
average number of signal events that indicate the new Phenomenon ($N_s$)
and of the average number of background events ($N_b$) in the experiment
are given. We suppose that the events have Poisson distributions with the
parameters $N_s$ and $N_b$, i.e. the random variable
$\xi \sim \mathrm{Pois}(N_s)$ describes the signal events and the random
variable $\eta \sim \mathrm{Pois}(N_b)$ describes the background events.
Say we observed $x$ events, a realization of the process under study
$X = \xi + \eta$ ($x$ is the sum of signal and background events in the
experiment). Here $N_s$ and $N_b$ are non-negative real numbers and $x$ is
an integer. The classical frequentist methods of testing a precise
hypothesis allow one to construct a rejection region and determine the
associated error probabilities for the following "simple" hypotheses:

$H_0$: $X \sim \mathrm{Pois}(N_s + N_b)$ versus $H_1$:
$X \sim \mathrm{Pois}(N_b)$, where $\mathrm{Pois}(N_s + N_b)$ and
$\mathrm{Pois}(N_b)$ have the probability density functions (p.d.f.'s)

$$f_0(x) = \frac{(N_s + N_b)^x}{x!} e^{-(N_s + N_b)}$$

for the case of presence and

$$f_1(x) = \frac{N_b^x}{x!} e^{-N_b}$$

for the case of absence of signal events in the universe population.

In Fig. 1 the p.d.f.'s $f_0(x)$ (a) and $f_1(x)$ (b) for the case
$N_s + N_b = 104$ and $N_b = 53$ (Table 13, cut 6 of the cited reference)
are shown. As is seen, the two p.d.f.'s overlap. Let us denote by $N_{ev}$
the threshold (critical value) that divides the abscissa in Fig. 1 into
the rejection region and the region where the hypothesis $H_0$ is
accepted. The incorrect rejection of the null hypothesis $H_0$, the Type I
error (the statement that the Phenomenon is absent when it is present),
has the probability

$$\alpha = \sum_{x=0}^{N_{ev}} f_0(x),$$

and the incorrect acceptance of $H_0$, the Type II error (the statement
that the Phenomenon exists when it is absent), has the probability

$$\beta = \sum_{x=N_{ev}+1}^{\infty} f_1(x).$$

The dependence of $\alpha$ and $\beta$ on the value of $N_{ev}$ for the
above example is presented in Fig. 2.
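The two sums for the Type I and Type II error probabilities can be evaluated directly. The following is a minimal sketch that assumes only the Poisson p.d.f.'s defined in this section; the function names are illustrative and do not come from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events for a given mean (> 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Pois(mean); returns 0 for k < 0."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

def error_probabilities(n_ev, n_s, n_b):
    """Return (alpha, beta) for the rejection region x <= n_ev."""
    # alpha: H0 (signal present) is rejected although it is true
    alpha = poisson_cdf(n_ev, n_s + n_b)
    # beta: H0 is accepted although only background is present
    beta = 1.0 - poisson_cdf(n_ev, n_b)
    return alpha, beta

# Scan the threshold for the example N_s = 51, N_b = 53
for n_ev in range(60, 91, 5):
    alpha, beta = error_probabilities(n_ev, 51, 53)
    print(n_ev, round(alpha, 5), round(beta, 5))
```

Scanning `n_ev` in this way shows how the two errors trade off: raising the threshold increases $\alpha$ and decreases $\beta$.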



2 Hypothesis testing

In this Section we show the procedure of constructing the rejection region
for the likelihood ratio.

We denote by $B(x) = \frac{f_0(x)}{f_1(x)}$ the likelihood ratio of $H_0$
to $H_1$ in the region where $B(X)$ exists. The decision to either reject
or accept $H_0$ depends on the observed value of $B(x)$, where small
values of $B(x)$ correspond to the rejection of $H_0$. For the traditional
frequentist, the classical most powerful test of the simple hypothesis is
determined by some critical value $c$ such that

if $B(x) \le c$, reject $H_0$,

if $B(x) > c$, accept $H_0$.

In compliance with this test, the frequentist reports the Type I and Type
II error probabilities as $\alpha = P_0(B(X) \le c) = F_0(c)$ and
$\beta = P_1(B(X) > c) = 1 - F_1(c)$, where $F_0$ and $F_1$ are the
cumulative distribution functions of $B(X)$ under $H_0$ and $H_1$,
respectively. For a conventional equal-tailed test with $\alpha = \beta$,
the critical value $c$ satisfies $F_0(c) = 1 - F_1(c)$.
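For the two Poisson hypotheses the factorials cancel in the likelihood ratio, giving $B(x) = (1 + N_s/N_b)^x e^{-N_s}$, which increases with $x$. A cut on $B(x)$ is therefore equivalent to a cut on the count $x$. The sketch below uses direct sums instead of Monte Carlo and illustrative function names; it finds the largest rejection region with $\alpha \le 0.01$ and prints the values of $B$ on either side of the boundary.

```python
import math

def likelihood_ratio(x, n_s, n_b):
    """B(x) = f0(x)/f1(x) for Pois(n_s + n_b) versus Pois(n_b)."""
    return math.exp(x * math.log1p(n_s / n_b) - n_s)

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Pois(mean); returns 0 for k < 0."""
    return sum(math.exp(i * math.log(mean) - mean - math.lgamma(i + 1))
               for i in range(k + 1))

def critical_count(n_s, n_b, level):
    """Largest n_ev with alpha = P0(X <= n_ev) <= level (-1 if none)."""
    n_ev = -1
    while poisson_cdf(n_ev + 1, n_s + n_b) <= level:
        n_ev += 1
    return n_ev

n_ev = critical_count(10, 5, 0.01)
# Any critical value c between these two numbers defines the same test
print(n_ev, likelihood_ratio(n_ev, 10, 5), likelihood_ratio(n_ev + 1, 10, 5))
```

For $N_s = 10$, $N_b = 5$ the critical value $c = 0.035$ quoted in Table 1 lies between the two printed values of $B$.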

In the same way we can construct the rejection region and find the
critical values $c_1$, $c_2$ and $c_{12}$ and the probabilities $\alpha$
and $\beta$ for the statistics $s_1 = \frac{x - N_b}{\sqrt{N_b}}$ (for
"significance" $S_1$), $s_2 = \frac{x - N_b}{\sqrt{x}}$ (for
"significance" $S_2$) and $s_{12} = \sqrt{x} - \sqrt{N_b}$ (for
"significance" $S_{12}$). Here, the value of $x - N_b$ is the estimate of
the number of signal events. Note that "significance" $S_{12}$ depends on
$S_1$ and $S_2$, namely,

$$S_{12} = \frac{S_1 \cdot S_2}{S_1 + S_2}.$$
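The relation between the three significances follows from $(\sqrt{N_s + N_b} - \sqrt{N_b})(\sqrt{N_s + N_b} + \sqrt{N_b}) = N_s$. A short numerical check, with an illustrative helper name, is:

```python
import math

def significances(n_s, n_b):
    """Return (S1, S2, S12) for expected signal n_s and background n_b."""
    s1 = n_s / math.sqrt(n_b)
    s2 = n_s / math.sqrt(n_s + n_b)
    s12 = math.sqrt(n_s + n_b) - math.sqrt(n_b)
    return s1, s2, s12

for n_s, n_b in [(51, 53), (5, 1), (5000, 1.0e6)]:
    s1, s2, s12 = significances(n_s, n_b)
    # The last two printed numbers agree up to rounding
    print(n_s, n_b, s1, s2, s12, s1 * s2 / (s1 + s2))
```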

3 Determination of probability density functions for statistics

The probability density functions of the statistics under consideration
can be obtained in analytical form. Another way to obtain the p.d.f.'s is
to calculate them by a Monte Carlo simulation of the results of a large
number of experiments for the given values of $N_s$ and $N_b$ (see the
references for examples). In this study we use the latter approach. The
p.d.f.'s for $N_s + N_b = 104$ and $N_b = 53$ obtained in this way are
shown in Fig. 3 (these distributions are the result of $10^5$ simulated
experiments for the random variables $\xi$ and $\eta$). The difference
between these p.d.f.'s and the p.d.f.'s resulting from the direct
calculation of the probabilities (Fig. 1) is extremely small.

In Fig. 4 the p.d.f.'s of the statistic $s_2$ for the case $N_s = 51$,
$N_b = 53$ (a) and the case $N_s = 0$, $N_b = 53$ (b) are shown. The
behaviour of the probabilities $\alpha$ and $\beta$ versus the critical
value $c_2$ for the statistic $s_2$ is also presented in Fig. 4 (c).

It is worth stressing that this approach allows one to construct the
p.d.f.'s and, correspondingly, the acceptance and rejection regions for
complicated statistics while taking into account the systematic errors and
the uncertainties in the estimates of $N_b$ and $N_s$.
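The Monte Carlo procedure can be sketched as follows. This is only an illustration under stated assumptions: the Poisson sampler (Knuth's multiplication method), the fixed seeds, the number of trials and all function names are choices made here, not details given in the paper.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method; fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_counts(n_s, n_b, trials, seed=1):
    """Simulate x = xi + eta for `trials` pseudo-experiments."""
    rng = random.Random(seed)
    return [sample_poisson(n_s, rng) + sample_poisson(n_b, rng)
            for _ in range(trials)]

def s2_statistic(x, n_b):
    """s2 = (x - N_b)/sqrt(x); x = 0 is mapped to minus infinity."""
    return (x - n_b) / math.sqrt(x) if x > 0 else float("-inf")

def empirical_errors(c2, n_s, n_b, trials=5000, seed=1):
    """Estimate alpha = P0(s2 <= c2) and beta = P1(s2 > c2) by counting."""
    with_signal = simulate_counts(n_s, n_b, trials, seed)
    background = simulate_counts(0, n_b, trials, seed + 1)
    alpha = sum(s2_statistic(x, n_b) <= c2 for x in with_signal) / trials
    beta = sum(s2_statistic(x, n_b) > c2 for x in background) / trials
    return alpha, beta

print(empirical_errors(2.5, 51, 53))
```

Replacing `s2_statistic` by any other function of the simulated counts gives the empirical distribution of that statistic in the same way.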
4 Comparison of different statistics

We compare the statistic $s_1$, the statistic $s_2$, the statistic
$s_{12}$ and the likelihood ratio ($B(x - N_b)$ in our case). The reason
for the comparison is the existing opinion that the value of a statistic
of this type ($s_1$, $s_2$, $s_{12}$) characterizes the difference between
the samples with and without signal events in terms of "standard
deviations" ($1\sigma$, $2\sigma$, ..., $5\sigma$) (see footnote 1). To
anticipate a little, the values of $\alpha$ and $\beta$ corresponding to
these "standard deviations" depend on the sample size; for $S_1$, for
example, $\alpha$ and $\beta$ are appreciable even if $N_s$ and $N_b$
satisfy the condition $S_1 = 5$.

The Type I error probability $\alpha$ is also called the significance
level of the test. The value of $\beta$ is meaningful only when it is
related to an alternative hypothesis $H_1$. The quantity $1 - \beta$ is
referred to as the power function; it allows one to choose a preferable
statistic for the hypothesis testing. That is, for a specified
significance level we can determine the critical value $c$
(correspondingly, $c_1$, $c_2$, $c_{12}$) and find the power $1 - \beta$
of this criterion. The greater the value of $1 - \beta$, the better the
statistic separates the hypotheses for the specified value of $\alpha$.

In Table 1 the result of the comparison is shown. For several values of
$N_s$ and $N_b$ (significance level $\alpha = 0.01$) the critical values
$c_1$, $c_2$, $c_{12}$, $c$ and the corresponding values of the power
$1 - \beta$ of these criteria for the statistics $s_1$, $s_2$, $s_{12}$
and the likelihood ratio are presented. As is seen from Table 1, there is
no visible difference in the power values for the considered statistics,
i.e. any of these statistics can be used equivalently for the hypothesis
testing.
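One way to see why the powers nearly coincide is that, for fixed $N_b$, each of $s_1$, $s_2$ and $s_{12}$ (like the likelihood ratio) is an increasing function of the observed count $x$. A cut on any of them is therefore a cut on $x$. The sketch below is an illustration with direct Poisson sums rather than the Monte Carlo procedure of the paper; it computes the power at $\alpha \le 0.01$ and checks the monotonicity numerically. The function names are illustrative.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Pois(mean); returns 0 for k < 0."""
    return sum(math.exp(i * math.log(mean) - mean - math.lgamma(i + 1))
               for i in range(k + 1))

def power_at_level(n_s, n_b, level=0.01):
    """Return (n_ev, power) for the largest rejection region x <= n_ev
    whose Type I error does not exceed `level`."""
    n_ev = -1
    while poisson_cdf(n_ev + 1, n_s + n_b) <= level:
        n_ev += 1
    # power = 1 - beta = P1(X <= n_ev)
    return n_ev, poisson_cdf(n_ev, n_b)

def statistics(x, n_b):
    """(s1, s2, s12) for an observed count x > 0."""
    return ((x - n_b) / math.sqrt(n_b),
            (x - n_b) / math.sqrt(x),
            math.sqrt(x) - math.sqrt(n_b))

for n_s, n_b in [(10, 5), (15, 10), (15, 15)]:
    print(n_s, n_b, power_at_level(n_s, n_b))
```

The powers obtained this way can be compared with the corresponding rows of Table 1; small differences are expected because the table was produced by Monte Carlo.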



5 Equal-tailed test

Of concern to us is the question: what is meant by the statement that
$S_1 = \frac{N_s}{\sqrt{N_b}} = 5$ or
$S_2 = \frac{N_s}{\sqrt{N_s + N_b}} = 5$?

Tables 2 and 3 give the answer to this question. In Tables 2 and 3 the
values of $N_s$ and $N_b$ corresponding to the above condition and the
values of $\alpha$ and $\beta$ determined by applying the equal-tailed
test (in this study we use the conditions

1 If $f_1(x)$ is the standard normal distribution, then a $1\sigma$ deviation from 0 corresponds
to a tail area equal to 0.1587, $2\sigma$ to 0.0228, $3\sigma$ to 0.00135, $4\sigma$ to 0.000032 and
$5\sigma$ to 0.0000003.

2 The critical values are chosen so that $0.01 - \alpha$ is minimal subject to $\alpha \le 0.01$.
Table 1: Comparison of the power of the criteria for different statistics.
The values $c_1$, $c_2$, $c_{12}$ and $c$ are the critical values of the
statistics $s_1$, $s_2$, $s_{12}$ and the likelihood ratio for
$\alpha = 0.01$. The values $1 - \beta$ are the powers for the
corresponding critical values.

| $N_s$ | $N_b$ | $c_1$ | $1-\beta$ ($s_1$) | $c_2$ | $1-\beta$ ($s_2$) | $c_{12}$ | $1-\beta$ ($s_{12}$) | $c$ | $1-\beta$ (likelihood ratio) |
|-------|-------|-------|-------|-------|-------|-------|-------|---------|-------|
| 10 | 5  | 0.89 | 0.762 | 0.75 | 0.762 | 0.3  | 0.762 | 0.035 | 0.760 |
| 15 | 5  | 2.23 | 0.968 | 1.58 | 0.968 | 0.8  | 0.968 | 0.078 | 0.968 |
| 20 | 5  | 4.02 | 0.999 | 2.40 | 0.999 | 1.4  | 0.999 | 2.563 | 0.999 |
| 25 | 5  | 5.81 | 1.000 | 3.06 | 1.000 | 1.9  | 1.000 | 110.0 | 1.000 |
| 15 | 10 | 1.26 | 0.864 | 1.06 | 0.866 | 0.4  | 0.865 | 0.045 | 0.864 |
| 20 | 10 | 2.52 | 0.986 | 1.88 | 0.986 | 0.9  | 0.985 | 0.269 | 0.986 |
| 25 | 10 | 3.79 | 0.999 | 2.55 | 0.999 | 1.4  | 0.999 | 3.939 | 0.999 |
| 30 | 10 | 5.05 | 1.000 | 3.13 | 1.000 | 1.8  | 1.000 | 307.0 | 1.000 |
| 15 | 15 | 0.77 | 0.750 | 0.70 | 0.747 | 0.2  | 0.750 | 0.040 | 0.749 |
| 20 | 15 | 1.80 | 0.947 | 1.49 | 0.947 | 0.7  | 0.948 | 0.117 | 0.947 |
| 25 | 15 | 2.84 | 0.994 | 2.15 | 0.994 | 1.1  | 0.994 | 0.667 | 0.994 |
| 30 | 15 | 3.87 | 0.999 | 2.73 | 1.000 | 1.5  | 1.000 | 7.795 | 1.000 |
| 20 | 55 | 0.13 | 0.535 | 0.00 | 0.479 | -0.1 | 0.483 | 0.052 | 0.536 |
| 25 | 55 | 0.67 | 0.733 | 0.64 | 0.733 | 0.2  | 0.735 | 0.049 | 0.731 |
| 30 | 55 | 1.21 | 0.873 | 1.12 | 0.874 | 0.4  | 0.843 | 0.074 | 0.873 |
| 35 | 55 | 1.88 | 0.963 | 1.68 | 0.962 | 0.7  | 0.950 | 0.231 | 0.962 |
| 40 | 55 | 2.42 | 0.989 | 2.10 | 0.988 | 1.0  | 0.988 | 0.512 | 0.989 |
| 45 | 55 | 2.96 | 0.997 | 2.60 | 0.998 | 1.3  | 0.998 | 2.894 | 0.998 |
| 50 | 55 | 3.64 | 1.000 | 2.98 | 1.000 | 1.5  | 1.000 | 9.957 | 1.000 |


$\min(\beta - \alpha)$ and $\alpha \le \beta$) are presented. One can see
the dependence of $\alpha$ (or $\beta$) on the sample size. The case of
$N_s = 5$ and $N_b = 1$ for $S_1$ (Fig. 5) is perhaps the most dramatic
example. We have a $5\sigma$ deviation; however, if we reject the
hypothesis $H_0$ we are mistaken in 6.2% of cases, and if we accept the
hypothesis $H_0$ we are mistaken in 8.0% of cases.
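The equal-tailed conditions (minimal $\beta - \alpha$ with $\alpha \le \beta$) can be applied directly to the Poisson sums. The sketch below uses illustrative function names and exact sums instead of Monte Carlo; for $N_s = 5$, $N_b = 1$ it reproduces the percentages quoted above.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Pois(mean); returns 0 for k < 0."""
    return sum(math.exp(i * math.log(mean) - mean - math.lgamma(i + 1))
               for i in range(k + 1))

def equal_tailed(n_s, n_b):
    """Return (n_ev, alpha, beta) minimizing beta - alpha subject to
    alpha <= beta, or None if even n_ev = 0 gives alpha > beta."""
    best = None
    n_ev = 0
    while True:
        alpha = poisson_cdf(n_ev, n_s + n_b)
        beta = 1.0 - poisson_cdf(n_ev, n_b)
        # alpha grows and beta shrinks with n_ev, so the last threshold
        # that still satisfies alpha <= beta minimizes beta - alpha
        if alpha > beta:
            return best
        best = (n_ev, alpha, beta)
        n_ev += 1

print(equal_tailed(5, 1))
```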

Table 2: The dependence of $\alpha$ and $\beta$ determined by using the
equal-tailed test on $N_s$ and $N_b$ for $S_1 = 5$. $\kappa$ is the area
of intersection of the probability density functions $f_0(x)$ and
$f_1(x)$.

| $N_s$ | $N_b$ | $\alpha$ | $\beta$ | $\kappa$ |
|-------|--------|-------------|--------|-------------|
| 5    | 1      | [illegible] | 0.0803 | [illegible] |
| 10   | 4      | [illegible] | 0.0511 | [illegible] |
| 15   | 9      | [illegible] | 0.0415 | [illegible] |
| 20   | 16     | [illegible] | 0.0367 | [illegible] |
| 25   | 25     | [illegible] | 0.0225 | [illegible] |
| 30   | 36     | 0.0125 | 0.0225 | 0.0333 |
| 35   | 49     | 0.0139 | 0.0164 | 0.0303 |
| 40   | 64     | 0.0114 | 0.0171 | 0.0278 |
| 45   | 81     | 0.0124 | 0.0136 | 0.0260 |
| 50   | 100    | 0.0106 | 0.0143 | 0.0245 |
| 55   | 121    | 0.0114 | 0.0120 | 0.0234 |
| 60   | 144    | 0.0100 | 0.0126 | 0.0224 |
| 65   | 169    | 0.0106 | 0.0109 | 0.0216 |
| 70   | 196    | 0.0095 | 0.0115 | 0.0209 |
| 75   | 225    | 0.0101 | 0.0102 | 0.0203 |
| 80   | 256    | 0.0091 | 0.0107 | 0.0198 |
| 85   | 289    | 0.0096 | 0.0097 | 0.0193 |
| 90   | 324    | 0.0088 | 0.0101 | 0.0189 |
| 95   | 361    | 0.0081 | 0.0106 | 0.0185 |
| 100  | 400    | 0.0086 | 0.0097 | 0.0182 |
| 150  | 900    | 0.0078 | 0.0084 | 0.0162 |
| 500  | $10^4$ | 0.0068 | 0.0068 | 0.0136 |
| 5000 | $10^6$ | 0.0062 | 0.0065 | 0.0125 |

One can point out that for a large number of events the values of $\alpha$
for $S_1$ and $S_2$ approach each other. A simple argument explains this
dependence. The quantity $x - N_b$ has a standard deviation equal to
$\sqrt{N_s + N_b}$ for nonzero signal and to $\sqrt{N_b}$ if signal events
are absent. Correspondingly, if $N_b \gg N_s$, the contribution of $N_s$
to the standard deviation is very small. Therefore, the standard deviation
tends to unity both for the distribution of $s_1$ (Fig. 6) and for the
distribution of $s_2$. It means that for sufficiently large $N_b$ the
values of $\alpha$ and $\beta$ obtained by the equal-tailed test have a
constant value close to 0.0062. These distributions can also be
approximated by the standard Gaussian $\mathcal{N}(0, 1)$ (see footnote 3)
for the pure background and the Gaussian $\mathcal{N}(5, 1)$ for the
signal mixed with the background. Therefore, the equal-tailed test for the
normal distributions gives $c_1 = 2.5$ and $\alpha = \beta = 0.0062$.
These are the limiting values of $\alpha$ and $\beta$ for the requirement
$S_1 = 5$ or $S_2 = 5$ (incidentally, $S_{12}$ equals 2.5 in this case).
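The Gaussian limit can be checked with the complementary error function. With unit-variance Gaussians centred at 0 and 5, the equal-tailed cut sits midway at 2.5, and both errors equal the one-sided tail area beyond 2.5 standard deviations. The helper name below is illustrative.

```python
import math

def upper_tail(z):
    """One-sided tail area P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

c1 = 2.5
alpha = upper_tail(5.0 - c1)   # P(N(5, 1) <= c1), by symmetry
beta = upper_tail(c1)          # P(N(0, 1) > c1)
print(alpha, beta)

# One-sided tail areas for 1 to 5 standard deviations
for n_sigma in range(1, 6):
    print(n_sigma, upper_tail(n_sigma))
```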

In a similar way we can determine the behaviour of the Type I and Type II
errors depending on $N_s$ and $N_b$ for a small number of events, and we can

3 $\mathcal{N}$(mean, variance) is the conventional notation for the normal distribution.
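Table 3: The dependence of $\alpha$ and $\beta$ determined by using the
equal-tailed test on $N_s$ and $N_b$ for $S_2 \approx 5$. $\kappa$ is the
area of intersection of the probability density functions $f_0(x)$ and
$f_1(x)$.

| $N_s$ | $N_b$ | $\alpha$ | $\beta$ | $\kappa$ |
|-------|-------|--------------------------|--------------------------|--------------------------|
| 26  | 1   | $0.519 \cdot 10^{-5}$ | $0.102 \cdot 10^{-4}$ | $0.154 \cdot 10^{-4}$ |
| 29  | 4   | $0.661 \cdot 10^{-4}$ | $0.764 \cdot 10^{-4}$ | $0.142 \cdot 10^{-3}$ |
| 33  | 9   | $0.127 \cdot 10^{-3}$ | $0.439 \cdot 10^{-3}$ | $0.440 \cdot 10^{-3}$ |
| 37  | 16  | $0.426 \cdot 10^{-3}$ | $0.567 \cdot 10^{-3}$ | $0.993 \cdot 10^{-3}$ |
| 41  | 25  | $0.648 \cdot 10^{-3}$ | $0.118 \cdot 10^{-2}$ | $0.172 \cdot 10^{-2}$ |
| 45  | 36  | $0.929 \cdot 10^{-2}$ | $0.193 \cdot 10^{-2}$ | $0.262 \cdot 10^{-2}$ |
| 50  | 49  | $0.133 \cdot 10^{-2}$ | $0.185 \cdot 10^{-2}$ | $0.314 \cdot 10^{-2}$ |
| 55  | 64  | $0.178 \cdot 10^{-2}$ | $0.179 \cdot 10^{-2}$ | $0.357 \cdot 10^{-2}$ |
| 100 | 300 | $0.317 \cdot 10^{-2}$ | $0.428 \cdot 10^{-2}$ | $0.735 \cdot 10^{-2}$ |
| 150 | 750 | $0.445 \cdot 10^{-2}$ | $0.450 \cdot 10^{-2}$ | $0.894 \cdot 10^{-2}$ |
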

predict the limiting values of $\alpha$ and $\beta$ for a large number of
events in the case of other statements about the statistic $s_1$ (Table 4)
or any other estimator.

The right column in Tables 2, 3 and 4 contains the value of the
probability $\kappa$. $\kappa$ is a characteristic of the observability of
the Phenomenon for the given $N_s$ and $N_b$. In particular, it is the
fraction of the p.d.f. $f_0(x)$ for the statistic $x$ that can be
described by a fluctuation of the background in the absence of the
Phenomenon. The value of $\kappa$ equals the area of intersection of the
probability density functions $f_0(x)$ and $f_1(x)$ (Fig. 1). Clearly, if
we superimpose the p.d.f.'s $f_0(x)$ and $f_1(x)$ and choose the
intersection point of the curves (the point

$$N_{ev} = \left\lfloor \frac{N_s}{\ln\left(1 + \frac{N_s}{N_b}\right)} \right\rfloor$$

) as a critical value for the hypothesis testing (see footnote 4), we have
$\kappa = \alpha + \beta$. As is seen from Tables 2, 3 and 4, the value of
$\kappa$ is also close to the sum $\alpha + \beta$ determined by using the
equal-tailed test.
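The identity $\kappa = \alpha + \beta$ at the intersection point can be verified numerically. In the sketch below the intersection area is computed as the sum of $\min(f_0(x), f_1(x))$ over $x$, truncated where both tails are negligible. The truncation point and the function names are assumptions made for this illustration.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events for a given mean (> 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def intersection_area(n_s, n_b):
    """kappa: overlap of f0 = Pois(n_s + n_b) and f1 = Pois(n_b)."""
    total = n_s + n_b
    x_max = int(total + 20.0 * math.sqrt(total) + 50)  # tails beyond are negligible
    return sum(min(poisson_pmf(x, total), poisson_pmf(x, n_b))
               for x in range(x_max + 1))

def crossing_count(n_s, n_b):
    """Largest integer x at which f0(x) <= f1(x)."""
    return math.floor(n_s / math.log1p(n_s / n_b))

def errors_at(n_ev, n_s, n_b):
    """(alpha, beta) for the rejection region x <= n_ev."""
    alpha = sum(poisson_pmf(x, n_s + n_b) for x in range(n_ev + 1))
    beta = 1.0 - sum(poisson_pmf(x, n_b) for x in range(n_ev + 1))
    return alpha, beta

n_ev = crossing_count(5, 1)
alpha, beta = errors_at(n_ev, 5, 1)
print(n_ev, alpha + beta, intersection_area(5, 1))
```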

The accuracy of the determination of the critical value by Monte Carlo
calculations depends on the number of Monte Carlo trials and on the level
of significance defined by the critical value. To illustrate, Fig. 7 shows the

4 Notice that at this point $f_0(N_{ev}) \approx f_1(N_{ev})$ (in our case the conditions
$\min(f_0(N_{ev}) - f_1(N_{ev}))$ and $f_1(N_{ev}) \le f_0(N_{ev})$ are imposed). This means that this
check can be called the equal probability test. Of course, in hypothesis testing we
can also determine $N_{ev}$ by finding the minimum of the sum of $\alpha$ and $\beta$, or
the minimum of the weighted sum of $\alpha$ and $\beta$, or by exploiting any other condition in
accordance with the requirements of the experiment. $\kappa$ may be thought of as independent
of these requirements.
distribution of the estimates of the value $\frac{\alpha + \beta}{2}$ for
the case $N_s = 100$, $N_b = 500$ and for $10^5$ Monte Carlo trials in
each estimate (the equal-tailed test is used). The result obtained via the
direct calculation of the p.d.f.'s is also shown in this Figure. Thus,
this method is accurate enough to give reliable results for the estimation
of the discovery potential of the experiment.
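A rough feeling for this accuracy comes from the standard binomial argument, which is not taken from the paper. An error probability $p$ estimated as a fraction of $n$ independent Monte Carlo trials has a statistical standard error of $\sqrt{p(1-p)/n}$. The function name below is illustrative.

```python
import math

def mc_standard_error(p, trials):
    """Binomial standard error of a probability p estimated from `trials`
    independent Monte Carlo experiments."""
    return math.sqrt(p * (1.0 - p) / trials)

# Absolute and relative uncertainty for 10^5 trials at several error levels
for p in (0.1, 0.01, 0.001):
    se = mc_standard_error(p, 10**5)
    print(p, se, se / p)
```

The relative uncertainty grows as the probability being estimated becomes smaller, which is why the required number of trials depends on the significance level.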

The approach of determining the critical region in hypothesis testing by
Monte Carlo calculation of the p.d.f.'s can be used to estimate the
integrated luminosity that is necessary for detecting the predicted
effects with sufficient accuracy. In Fig. 8 (a) the dependence of $N_{ev}$
on the integrated luminosity (Table 12, cut 5 of the cited reference,
$m_{\chi_1} = 85$ GeV, $N_s = 45$, $N_b = 45$) is shown. The corresponding
values of $\alpha$ and $\beta$ are presented in Fig. 8 (b). As is evident
from the Figure, the integrated luminosity $L = 8 \cdot 10^4$ pb$^{-1}$ is
sufficient to detect sleptons under the requirement that the probability
$\kappa \approx \alpha + \beta$ be less than 1%.

Conclusion

In this paper the discussion of the observation of a new Phenomenon is
restricted to the testing of simple hypotheses in the case of the
predicted values $N_s$ and $N_b$ and the observed value $x$. As is
stressed in the cited literature, precise hypothesis testing should not be
done by forming a traditional confidence interval and simply checking
whether or not the precise hypothesis is compatible with the confidence
interval. A confidence interval is usually of considerable importance in
determining where the unknown parameter is likely to be, given that the
alternative hypothesis is true, but it is not useful in determining
whether or not a precise null hypothesis is true.

To compare several statistics used for hypothesis testing, we employ a
method that allows one to construct the rejection regions by determining
the probability density functions of these statistics with Monte Carlo
calculations. As is shown, the considered statistics have close values of
power for the specified significance level and can be used for hypothesis
testing in an equivalent manner. It has also been shown that the estimates
of the Type I and Type II errors obtained by this method have a reasonable
accuracy. The method was used to make inferences on the observability of
some predicted phenomena.
Figure 1: The probability density functions $f_0(x)$ (a) and $f_1(x)$ (b)
for the case of 51 signal events and 53 background events obtained by
direct calculation of the probabilities.

Figure 2: The dependence of the Type I ($\alpha$) and Type II ($\beta$)
errors on $N_{ev}$ for the case of 51 signal events and 53 background
events.

Figure 3: The probability density functions $f_0(x)$ (a) and $f_1(x)$ (b)
for the case of 51 signal events and 53 background events obtained by
Monte Carlo simulation.

Figure 4: The probability density functions $f_0(x)$ (a) and $f_1(x)$ (b)
of the statistic $s_2$. The dependence of the Type I and Type II errors on
the critical value $c_2$ (c) for the case of 51 signal events and 53
background events.

Figure 5: The probability density functions $f_0(x)$ (a) and $f_1(x)$ (b)
of the statistic $s_1$. The dependence of the Type I and Type II errors on
the critical value $c_1$ (c) for the case of 5 signal events and 1
background event.

Figure 6: The probability density functions $f_0(x)$ (a) and $f_1(x)$ (b)
of the statistic $s_1$. The dependence of the Type I and Type II errors on
the critical value $c_1$ (c) for the case of 5000 signal events and $10^6$
background events.

Table 4: The dependence of $\alpha$ and $\beta$ determined by using the
equal-tailed test on $N_s$ and $N_b$ for $S_1 = 2$, $S_1 = 3$, $S_1 = 4$,
$S_1 = 6$ and $S_1 = 8$. $\kappa$ is the area of intersection of the
probability density functions $f_0(x)$ and $f_1(x)$.

| $S_1$ | $N_s$ | $N_b$ | $\alpha$ | $\beta$ | $\kappa$ |
|-------|-------|-----------|-----------|-----------|-----------|
| 2 | 2 | [missing] | 0.199 | 0.265 | 0.4634 |
|   | 4 | [missing] | 0.192 | 0.216 | 0.4061 |
|   | 6 | [missing] | [missing] | [missing] | [missing] |

Figure 7: The variation of $\frac{\alpha + \beta}{2}$ in the equal-tailed
hypothesis testing ($N_s = 100$, $N_b = 500$ and $N_s = 0$, $N_b = 500$ in
40 Monte Carlo simulations of the probability density functions).

Figure 8: The dependence of the critical value $N_{ev}$ (a) and of the
Type I and Type II errors (b) on the integrated luminosity $L$ for the
case $N_s = N_b$, with $N_s = 45$ for $L = 10^5$ pb$^{-1}$ (equal-tailed
test).